Statistical Analysis of Regularization Constant - From Bayes, MDL and NIC Points of View
Authors
Abstract
In order to avoid overfitting in neural learning, a regularization term is added to the loss function to be minimized. It is naturally derived from the Bayesian standpoint. The present paper studies how to determine the regularization constant from the points of view of the empirical Bayes approach, the minimum description length (MDL) approach, and the network information criterion (NIC) approach. An asymptotic statistical analysis is given to elucidate their differences. These approaches are tightly connected with the method of model selection. The superiority of the NIC is shown from this analysis.
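The abstract does not reproduce the NIC formula, but as a rough illustration of what "determining the regularization constant from the NIC point of view" can look like in practice, the sketch below scans candidate constants for a ridge-regularized linear model and keeps the one minimizing an NIC-style score: empirical risk plus tr(G H^{-1})/n, with H the Hessian of the penalized loss and G the empirical covariance of per-example gradients. The linear model, the use of the penalized per-example loss in G and H, and the function `nic_for_lambda` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def nic_for_lambda(X, y, lam):
    """Rough NIC-style score for a ridge-regularized linear model.

    Returns the unpenalized empirical risk at the penalized estimate
    plus a complexity term tr(G H^{-1}) / n, where H is the Hessian of
    the penalized loss and G the empirical covariance of per-example
    penalized-loss gradients (an assumption, not the paper's exact form).
    """
    n, d = X.shape
    # Ridge estimate: minimizer of (1/2n)||y - Xw||^2 + (lam/2)||w||^2
    H = X.T @ X / n + lam * np.eye(d)      # Hessian of the penalized loss
    w = np.linalg.solve(H, X.T @ y / n)

    resid = y - X @ w
    risk = 0.5 * np.mean(resid ** 2)       # unpenalized empirical risk

    # Per-example gradients of the penalized loss at the estimate
    grads = -resid[:, None] * X + lam * w  # shape (n, d)
    G = grads.T @ grads / n

    complexity = np.trace(G @ np.linalg.inv(H)) / n
    return risk + complexity

# Toy usage: pick the regularization constant minimizing the criterion.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.concatenate([np.ones(3), np.zeros(d - 3)])
y = X @ w_true + 0.5 * rng.normal(size=n)

lambdas = np.logspace(-4, 1, 30)
scores = [nic_for_lambda(X, y, lam) for lam in lambdas]
print(f"selected regularization constant: {lambdas[int(np.argmin(scores))]:.4g}")
```

The scan plays the same role that cross-validation would: each candidate constant is scored by an estimate of generalization error, and the complexity term replaces the held-out evaluation.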
Similar Articles
The meaning of place, a constant or changing quality? Lynch, Rapoport and semiotics viewpoints
The matter of meaning in place is one of the main qualities of human life. People consciously or unconsciously look for meanings in places. The importance of finding the meaning of place is that understanding the meaning will lead to "act" in place. Finding the place friendly, or finding it insecure, will lead to acting differently. Now the question is: is the meaning of place something ...
Characterization of the Bayes estimator and the MDL estimator for exponential families
We analyze the relationship between a Minimum Description Length (MDL) estimator (posterior mode) and a Bayes estimator for exponential families. We show the following results concerning these estimators: a) Both the Bayes estimator with Jeffreys prior and the MDL estimator with the uniform prior with respect to the expectation parameter are nearly equivalent to a bias-corrected maximum-likelih...
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/o...
Minimum Description Length Principle
The minimum description length (MDL) principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for. MDL provides a versatile approach to statistical modeling. It is applicable to model selection and regularization. Modern versions of MDL lead to robust methods that are well suited for choosing an ...
Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity
We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting “safe” estimator continues to achieve good rate...